Abstract. - A large number of complex networks, both natural and artificial, share the 
presence of highly heterogeneous, scale-free degree distributions. A few mechanisms for the 
emergence of such patterns have been suggested, optimization not being one of them. In this 
letter we present the first evidence for the emergence of scaling (and the presence of small 
world behavior) in software architecture graphs from a well-defined local optimization process. 
Although the rules that define the strategies involved in software engineering should lead to 
a tree-like structure, the final net is scale-free, perhaps reflecting the presence of conflicting 
constraints unavoidable in a multidimensional optimization process. The consequences for 
other complex networks are outlined. 



Two basic features common to many complex networks, from the Internet to metabolic 
nets, are their scale-free (SF) topology and a small-world (SW) structure. The 
first states that the proportion of nodes $P(k)$ having $k$ links decays as a power law 
$P(k) \sim k^{-\gamma} \phi(k/\xi)$ (with $\gamma \approx 2-3$), where $\phi(k/\xi)$ introduces a cut-off at some 
characteristic scale $\xi$. Examples of SF nets include Internet topology, cellular networks, 
scientific collaborations and lexical networks. The second refers to a web exhibiting very 
small average path lengths between nodes along with a large clustering. 

Although it has been suggested that these nets originate from preferential attachment, 
the success of theoretical approximations to branching nets from optimization theory 
would support optimality as an alternative scenario. In this context, it has been shown that 
minimization of both vertex-vertex distance and link length (i.e. Euclidean distance between 
vertices) can lead to the SW phenomenon. In a similar context, SF networks have been 
shown to originate from a simultaneous minimization of link density and path distance. 
Optimal wiring has also been proposed within the context of neural maps: 'save wiring' 
is an organizing principle of brain structure. However, although the analysis of functional 
connectivity in the cerebral cortex has shown evidence for SW, the degree distribution is 
clearly non-skewed but single-scaled (i.e. $\xi$ is very small). 

The origin of highly heterogeneous nets is particularly important since it has been shown 
that these networks are extremely resilient under random failure: removal of randomly chosen 
nodes (typically displaying low degree) seldom alters the fitness of the net. However, when 
nodes are removed by sequentially eliminating those with higher degree, the system rapidly 
experiences network fragmentation. 
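
The contrast between random failure and targeted removal can be illustrated with a deliberately extreme toy case. The sketch below is not taken from the letter: the star graph, the function names and the choice of the largest connected component as a proxy for the "fitness of the net" are all assumptions made for illustration.

```python
from collections import deque

def largest_component_size(adj, removed=frozenset()):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)          # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:             # breadth-first search over surviving nodes
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best

def star_graph(n_leaves):
    """Hypothetical toy net: node 0 is a hub linked to n_leaves degree-1 nodes."""
    adj = {0: set(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        adj[leaf] = {0}
    return adj

adj = star_graph(10)
hub = max(adj, key=lambda v: len(adj[v]))                     # highest-degree node
after_failure = largest_component_size(adj, removed={3})      # a low-degree node fails
after_attack = largest_component_size(adj, removed={hub})     # the hub is removed
```

Losing a low-degree node leaves the remaining ten nodes connected, whereas losing the hub isolates every leaf. Real scale-free nets are far less extreme than a star, so this only conveys the direction of the effect.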

© EDP Sciences 



EUROPHYSICS LETTERS 




Fig. 1 - (a) One of the largest components of the Java net ($\Omega_2$) displays scale-free and small-world 
behavior (see text). In (b) the cumulative frequencies $P_>(k)$ are shown for the two largest components. 
We have $P_>(k) \sim k^{-\gamma_i}$ with $\gamma_1 = 1.5 \pm 0.05$ and $\gamma_2 = 1.65 \pm 0.08$. 



Artificial networks offer an invaluable reference when dealing with the rules that underlie 
their building process. Here we show that a very important class of networks, derived from 
software architecture maps, displays the previous patterns as a result of a design optimization 
process. 

The importance of software, and of understanding how to build software systems efficiently, 
is one of our major concerns. Software is present in the core of scientific research, economic 
markets, military equipment and health care systems, to name a few. High costs (thousands 
of millions of dollars) are associated with the software development process. In the past 
30 years we have witnessed the birth and technological evolution of software engineering, 
whose objective is to provide methodologies and tools to control and build software efficiently. 
Software engineers conceive programs with graphs as architects use plans for buildings. The 
software architecture is the structure of the program. The building blocks are software 
components and links are relationships between software components. The interactions between 
all the components yield the program functionality. Class diagrams constitute a well-known 
example of such graphs. In this case, software components are also known by the technical 
term class. We have analysed the class diagram of the public Java Development Framework 1.2 
(JDK1.2) [21], which is a large set of software components widely used by Java applications, 
as well as the architecture of a large computer game. 

These are examples of highly optimized structures, where design principles call for diagram 
comprehensibility, grouping components into modules, flexibility and reusability (i.e. avoiding 
having the same task performed by different components). Although the entire plan is 
controlled by engineers, no design principle explicitly introduces preferential attachment, 
scaling or small-worldness. The resulting graphs, however, turn out to be SW and SF nets. 

The software graph is defined by a pair $\Omega_s = (W_s, E_s)$, where $W_s = \{s_i\}$, $(i = 1, \ldots, N)$ is 
the set of $N = |\Omega_s|$ classes and $E_s = \{\{s_i, s_j\}\}$ is the set of edges/connections between classes. 
The adjacency matrix $\xi_{ij}$ indicates that an interaction exists between classes $s_i, s_j \in \Omega_s$ 
($\xi_{ij} = 1$) or that the interaction is absent ($\xi_{ij} = 0$). The average path length $l$ is given by the 
average $l = \langle l_{min}(i,j) \rangle$ over all pairs $s_i, s_j \in \Omega_s$, where $l_{min}(i,j)$ indicates the length of the 
shortest path between two nodes. The clustering coefficient $C$ is defined as the probability that 
two classes that are neighbors of a given class are neighbors of each other. Poissonian graphs 
with an average degree $\langle k \rangle$ are such that $C \approx \langle k \rangle / N$ and the path length follows 

$l \approx \frac{\log N}{\log \langle k \rangle}$ (1) 

$C$ is easily defined from the adjacency matrix, and is given by: 

(2) 



It provides a measure of the average fraction of pairs of neighbors of a node that are also 
neighbors of each other. 
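
As an illustration of the two measures just defined, the following minimal sketch computes them for a graph stored as a dictionary of neighbor sets. It follows the verbal definitions only. The function names are hypothetical, and two conventions are assumptions rather than statements from the text: nodes with fewer than two neighbors contribute zero to $C$, and the path length is averaged over connected pairs only.

```python
from collections import deque

def clustering_coefficient(adj):
    """Average over nodes of the fraction of neighbor pairs that are linked.

    Assumption: nodes with fewer than two neighbors contribute 0.
    """
    total = 0.0
    for node, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue
        nbrs = list(neighbors)
        # count linked pairs among the k neighbors of this node
        linked = sum(1 for a in range(k) for b in range(a + 1, k)
                     if nbrs[b] in adj[nbrs[a]])
        total += 2.0 * linked / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (BFS per node)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1   # reachable nodes other than the source
    return total / pairs

# Two tiny undirected examples: a triangle and a three-node chain.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
chain = {0: {1}, 1: {0, 2}, 2: {1}}
```

For the triangle both quantities equal 1. For the chain the clustering vanishes and the average path length is 4/3, which shows that a tree-like structure carries no clustering at all.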

The building process of a software graph is done in parallel (different parts are built and 
gradually get connected) and is assumed to follow some standard rules of design. None 
of these rules refers to the overall organization of the final graph. Essentially, they deal with 
optimal communication among modules and low cost (in terms of wiring) together with the 
rule of avoiding hubs (classes with a large number of dependencies, that is, large degree). The 
set of bad design practices, such as making use of large hubs, is known as antipatterns in 
the software literature. The development time of the application should be as short 
as possible because of the high costs involved. It is argued in the literature [23] that there is 
an optimum number of components so that the cost of development is minimized, but it is not 
possible to make a reliable prediction about this number. Adding new software components 
involves more cost in terms of interconnections between them (links). Conversely, the cost per 
single software component decreases as the overall number of components (nodes) is increased 
because the functionality is spread over the entire system. Intuitively, a trade-off between the 
number of nodes and the number of links must be chosen. 

However, we have found that this (local) optimization process results in a net that exhibits 
both scaling and small-world structure. First, we analyzed the JDK1.2 network, which has $N = 9257$ 
nodes and $N_c = 3115$ connected components, so that the complete graph $\Omega_s$ is actually given 
by $\Omega_s = \cup_{i=1}^{N_c} \Omega_i$, where the set is ordered from larger to smaller components 
($|\Omega_1| > |\Omega_2| > \ldots > |\Omega_{N_c}|$). The largest connected component, $\Omega_1$, has $N_1 = 1376$, 
with $\langle k \rangle = 3.16$ and $\gamma = 2.5$; its clustering coefficient [4] is $C = 0.06 \gg C^{rand} = 0.002$ 
and the average distance is $l = 6.39 \approx l^{rand} = 6.28$, i.e. it is a small world. The same basic 
results are obtained for $\Omega_2$ (shown in fig. 1a): here we have $N_2 = 1364$, $\langle k \rangle = 2.83$ and 
$\gamma = 2.65$, $C = 0.08 \gg C^{rand} = 0.002$ and $l = 6.91 \approx l^{rand} = 6.82$. 
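
The random-graph reference values for the largest component can be checked against the Poissonian estimates given earlier, $C \approx \langle k \rangle / N$ and $l \approx \log N / \log \langle k \rangle$. The short sketch below does this arithmetic for $\Omega_1$; the function name is a hypothetical helper, and only the inputs $N_1 = 1376$ and $\langle k \rangle = 3.16$ come from the text.

```python
import math

def random_graph_baselines(n_nodes, mean_degree):
    """Poissonian-graph estimates: C_rand ~ <k>/N and l_rand ~ log N / log <k>."""
    c_rand = mean_degree / n_nodes
    l_rand = math.log(n_nodes) / math.log(mean_degree)
    return c_rand, l_rand

# Inputs quoted in the text for the largest JDK1.2 component.
c_rand, l_rand = random_graph_baselines(1376, 3.16)
print(f"C_rand = {c_rand:.4f}, l_rand = {l_rand:.2f}")
# prints "C_rand = 0.0023, l_rand = 6.28"
```

Both numbers agree with the quoted $C^{rand} = 0.002$ and $l^{rand} = 6.28$, so the small-world statement rests on a measured clustering roughly 25 times larger than the random expectation at an almost identical path length.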

The degree distribution for the two largest components is shown in figure 1b, where we 
have represented the cumulative distribution 

$P_>(k; \Omega_i) = \sum_{k' > k} p(k'; \Omega_i)$ (3) 

for $i = 1, 2$. We can see that the largest components display scaling, with estimated exponents 
$\gamma \approx 2.5 - 2.65$. 
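
A cumulative distribution of this kind can be computed directly from a list of node degrees. The sketch below is an illustrative implementation, not the authors' code: the function name and the sample degree list are made up, and the sum is taken over degrees strictly greater than $k$, as in the definition above.

```python
from collections import Counter

def cumulative_degree_distribution(degrees):
    """Return {k: P_>(k)}, the fraction of nodes whose degree is strictly above k."""
    n = len(degrees)
    counts = Counter(degrees)
    result = {}
    above = 0  # nodes with degree strictly greater than the current k
    for k in sorted(counts, reverse=True):
        result[k] = above / n
        above += counts[k]
    return dict(sorted(result.items()))

# Hypothetical degree sequence, for illustration only.
degrees = [1, 1, 1, 1, 2, 2, 3, 5]
cumulative = cumulative_degree_distribution(degrees)
```

Working with the cumulative form smooths the noisy tail of the raw histogram. For a pure power law $p(k) \sim k^{-\gamma}$ the cumulative distribution decays as $k^{-(\gamma - 1)}$, so slopes of 1.5 and 1.65 on the cumulative plot of Fig. 1 correspond to $\gamma \approx 2.5$ and $2.65$.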

Similar results have been obtained from the analysis of a computer game graph. This 
is a single, complex piece of software which consists of $N = 1989$ classes involving different 
aspects such as real-time computer graphics, rigid body simulation, sound and music playing, 
graphical user interface and memory management. The software architecture graph for the 
game has a large connected component that relates all subsystems. The cumulative degree 
frequency for the entire system is scale-free, with $\gamma = 2.85 \pm 0.11$. The network also displays 
SW behaviour: the clustering coefficient is $C = 0.08 \gg C^{rand} = 0.002$ and the average 
distance is $l = 6.2$, close to $l^{rand} = 4.84$. 

Fig. 2 - (a) Using the 32 connected components with more than 10 classes (nodes), the $l \log(k) - N$ 
plot is shown. As predicted from a SW structure, the components follow a straight line in this 
linear-log diagram. Three subwebs are shown (b-d), displaying hubs but no clustering (their location 
is indicated in (a)). The black square corresponds to the computer game graph. 

These results reveal a previously unreported global feature of software architecture which 
can have important consequences in both technology and biology. This is, as far as we know, 
the first example of a scale-free graph resulting from a local optimization process instead 
of preferential attachment or duplication-rewiring [26] rules. Since the failure of a 
single module leads to the system's breakdown, no global homeostasis has been at work as an 
evolutionary principle, as it might have occurred in cellular nets. In spite of this, the final 
structure is very similar to those reported from the analysis of cellular networks. Second, our 
results suggest that optimization processes might also be at work in the latter, as has been 
shown to occur in transport nets. 

Complex biosystems are often assumed to result from selection processes together with 
a large amount of tinkering. By contrast, it is often assumed that engineered, artificial 
systems are highly optimized entities, although selection would also be at work. Such 
differences should be observable when comparing both types, but the analysis of both natural 
and artificial nets indicates that they are often remarkably similar, perhaps suggesting general 
organization principles. Our results support an alternative scenario to preferential attachment 
based on cost minimization together with optimal communication among units. 
The fact that small-sized software graphs are trees (as one would expect from optimization 
leading to hierarchical structures such as stochastic Cayley trees) but that clustering 
emerges at larger sizes might be the outcome of a combinatorial optimization process: as the 
number of modules increases, the conflicting constraints that arise among different parts of 
the system would prevent reaching an optimal structure. Concerning cellular networks, 
although preferential linking might have been at work [30], optimization has probably played 
a key role in shaping metabolic pathways. We conjecture that the common origin of 
SF nets in both cellular and artificial systems such as software might stem from a process of 
optimization involving low cost (sparse graph) and short paths. For cellular nets (but not in 
their artificial counterparts) the resulting graph includes, for free, an enormous homeostasis 
against random failure. 